regexp(n)                         Tcl                         regexp(n)

_________________________________________________________________

NAME
regexp - Match a regular expression against a string

SYNOPSIS
regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
_________________________________________________________________

DESCRIPTION
Determines whether the regular expression exp matches part
or all of string and returns 1 if it does, 0 if it doesn't.

If additional arguments are specified after string then they
are treated as the names of variables in which to return
information about which part(s) of string matched exp.
MatchVar will be set to the range of string that matched all
of exp. The first subMatchVar will contain the characters
in string that matched the leftmost parenthesized
subexpression within exp, the next subMatchVar will contain
the characters that matched the next parenthesized
subexpression to the right in exp, and so on.
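
As an illustration of the variable arguments described above, the following sketch captures the whole match and two parenthesized subexpressions. The sample string and the variable names (line, all, year, month) are made up for this example and are not part of the command.

```tcl
# Hypothetical sample data; all names here are illustrative only.
set line "released 1995-07 final"

# matched is 1 if exp matches part or all of the string, 0 otherwise.
set matched [regexp {([0-9]+)-([0-9]+)} $line all year month]

# all   receives the text matched by the whole expression: 1995-07
# year  receives the text matched by the leftmost (...) group: 1995
# month receives the text matched by the next (...) group: 07
puts "$matched $all $year $month"
```

Bracing the expression keeps the Tcl parser from treating the square brackets as command substitution.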

If the initial arguments to regexp start with - then they
are treated as switches. The following switches are
currently supported:

-nocase
     Causes upper-case characters in string to be
     treated as lower case during the matching process.

-indices
     Changes what is stored in the subMatchVars.
     Instead of storing the matching characters from
     string, each variable will contain a list of two
     decimal strings giving the indices in string of
     the first and last characters in the matching
     range of characters.

--
     Marks the end of switches. The argument following
     this one will be treated as exp even if it starts
     with a -.
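
The switches can be tried out with a short sketch such as the one below. The strings and variable names are assumptions chosen for illustration; the expected values in the comments follow from the switch descriptions above.

```tcl
# -nocase: upper-case characters in the string are treated as lower case.
set plain  [regexp {hello} "Say HELLO"]            ;# 0
set nocase [regexp -nocase {hello} "Say HELLO"]    ;# 1

# -indices: store {first last} index pairs instead of the matched text.
regexp {(l+)o} "hello" all sub                     ;# sub is "ll"
regexp -indices {(l+)o} "hello" iall isub          ;# isub is "2 3"

# --: the next argument is taken as exp even though it starts with "-".
set dash [regexp -- {-v} "tclsh -v"]               ;# 1
```

Indices count from 0, so the two l characters in hello occupy positions 2 and 3.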

If there are more subMatchVar's than parenthesized
subexpressions within exp, or if a particular subexpression
in exp doesn't match the string (e.g. because it was in a
portion of the expression that wasn't matched), then the
corresponding subMatchVar will be set to ``-1 -1'' if
-indices has been specified or to an empty string otherwise.
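
A small sketch of both cases, using made-up variable names: a subexpression that sits in an unused branch, and a subMatchVar with no corresponding subexpression.

```tcl
# The (b) group is in a branch that is not used when the input is "a".
regexp {(a)|(b)} "a" all first second
# first is "a"; second is an empty string.

# The same match with -indices.
regexp -indices {(a)|(b)} "a" iall ifirst isecond
# ifirst is "0 0"; isecond is "-1 -1".

# One subexpression but two subMatchVars: extra has nothing to hold.
regexp {(a)} "a" all2 only extra
# only is "a"; extra is an empty string.
```

Testing a subMatchVar against the empty string (or against -1 -1 with -indices) is therefore one way to tell whether its group took part in the match, with the caveat that a group can also legitimately match an empty string.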

Page 1 (printed 7/17/95)

REGULAR EXPRESSIONS
Regular expressions are implemented using Henry Spencer's
package (thanks, Henry!), and much of the description of
regular expressions below is copied verbatim from his manual
entry.

A regular expression is zero or more branches, separated by
``|''. It matches anything that matches one of the
branches.

A branch is zero or more pieces, concatenated. It matches a
match for the first, followed by a match for the second,
etc.

A piece is an atom possibly followed by ``*'', ``+'', or
``?''. An atom followed by ``*'' matches a sequence of 0 or
more matches of the atom. An atom followed by ``+'' matches
a sequence of 1 or more matches of the atom. An atom
followed by ``?'' matches a match of the atom, or the null
string.
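
The effect of branches and of the three piece suffixes can be seen in a short sketch. The input strings and variable names are invented for illustration.

```tcl
# Two branches separated by |: the expression matches if either does.
set either [regexp {cat|dog} "hotdog"]   ;# 1, via the dog branch

# a followed by b*: zero or more b's.
regexp {ab*} "abbbc" star                ;# star is "abbb"
regexp {ab*} "ac" star0                  ;# star0 is "a" (zero b's)

# a followed by b+: at least one b is required.
set plusFail [regexp {ab+} "ac"]         ;# 0
regexp {ab+} "abbbc" plus                ;# plus is "abbb"

# a followed by b?: one b or nothing.
regexp {ab?} "abbc" opt                  ;# opt is "ab"
```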

An atom is a regular expression in parentheses (matching a
match for the regular expression), a range (see below),
``.'' (matching any single character), ``^'' (matching the
null string at the beginning of the input string), ``$''
(matching the null string at the end of the input string), a
``\'' followed by a single character (matching that
character), or a single character with no other significance
(matching that character).
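
The kinds of atom listed above can be exercised one at a time, as in this sketch with invented inputs. The backslash example deliberately escapes a punctuation character; later Tcl releases assign special meanings to a backslash followed by a letter or digit, so that case is not shown here.

```tcl
# "." matches any single character.
regexp {a.c} "xabcx" dot                 ;# dot is "abc"

# "^" and "$" match the null string at the two ends of the input.
set whole   [regexp {^abc$} "abc"]       ;# 1
set partial [regexp {^abc$} "xabc"]      ;# 0

# A backslash followed by a character matches that character itself.
set escNo  [regexp {a\.c} "abc"]         ;# 0, a literal "." is required
set escYes [regexp {a\.c} "a.c"]         ;# 1

# A parenthesized regular expression is an atom, so + applies to all of it.
regexp {(ab)+} "xababab" group           ;# group is "ababab"
```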

A range is a sequence of characters enclosed in ``[]''. It
normally matches any single character from the sequence. If
the sequence begins with ``^'', it matches any single
character not from the rest of the sequence. If two
characters in the sequence are separated by ``-'', this is
shorthand for the full list of ASCII characters between them
(e.g. ``[0-9]'' matches any decimal digit). To include a
literal ``]'' in the sequence, make it the first character
(following a possible ``^''). To include a literal ``-'',
make it the first or last character.
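
A brief sketch of the range rules, with made-up inputs and variable names. The expressions are braced so that Tcl does not interpret the square brackets as command substitution.

```tcl
# [0-9] is shorthand for the ten decimal digits.
set digits    [regexp {^[0-9]+$} "1995"]   ;# 1
set notDigits [regexp {^[0-9]+$} "19x5"]   ;# 0

# A leading ^ inverts the range: the first character that is not a digit.
regexp {[^0-9]} "19x5" other               ;# other is "x"

# Literal ] goes first in the sequence and literal - goes last.
set punct [regexp {^[]a-]+$} "a]-"]        ;# 1
```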

CHOOSING AMONG ALTERNATIVE MATCHES
In general there may be more than one way to match a regular
expression to an input string. For example, consider the
command

     regexp (a*)b* aabaaabb x y

Considering only the rules given so far, x and y could end
up with the values aabb and aa, aaab and aaa, ab and a, or
any of several other combinations. To resolve this
potential ambiguity regexp chooses among alternatives using
the rule ``first then longest''. In other words, it
considers the possible matches in order working from left to
right across the input string and the pattern, and it
attempts to match longer pieces of the input string before
shorter ones. More specifically, the following rules apply
in decreasing order of priority:

[1]  If a regular expression could match two different parts
     of an input string then it will match the one that
     begins earliest.

[2]  If a regular expression contains | operators then the
     leftmost matching sub-expression is chosen.

[3]  In *, +, and ? constructs, longer matches are chosen in
     preference to shorter ones.

[4]  In sequences of expression components the components
     are considered from left to right.

In the example from above, (a*)b* matches aab: the (a*)
portion of the pattern is matched first and it consumes the
leading aa; then the b* portion of the pattern consumes the
next b. Or, consider the following example:

     regexp (ab|a)(b*)c abc x y z

After this command x will be abc, y will be ab, and z will
be an empty string. Rule 4 specifies that (ab|a) gets first
shot at the input string and Rule 2 specifies that the ab
sub-expression is checked before the a sub-expression. Thus
the b has already been claimed before the (b*) component is
checked and (b*) must match an empty string.
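
The two commands discussed in this section can be run as written to check the stated results. The sketch below does that and adds one extra case for Rule 1. The suffixed variable names and the third input string are additions for illustration, and the results in the comments assume the behavior described above.

```tcl
# First example: the match starts at the earliest possible position.
regexp {(a*)b*} aabaaabb x y
# x is "aab", y is "aa"

# Second example: (ab|a) claims the b before (b*) is considered.
regexp {(ab|a)(b*)c} abc x2 y2 z2
# x2 is "abc", y2 is "ab", z2 is ""

# Rule 1 outranks Rule 3: the earlier "bb" wins over the longer "bbb".
regexp {b+} abbabbb first
# first is "bb"
```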

KEYWORDS
match, regular expression, string